Take-Home Exercise 4

Using R to visualise impact of Covid-19 on stock price of top 40 companies in Singapore

Caine Ng https://www.linkedin.com/in/caine-ng-069a5273/ (SMU - MITB)https://scis.smu.edu.sg/master-it-business
2022-02-26

1.0 Introduction

In this take home exercise the task will be to visualize the impact of Covid-19 on stock prices of top 40 companies in terms of market capitalization in Singapore. For this task stock prices between 01-January-2020 to 31-December 2021 will be used.

2.0 Challenges and considerations

The challenge of this exercise would be to represent the 40 stocks in a concise manner such that the 40 companies could be seen in one view, yet not hide patterns over the required time period.

3.0 Proposed solution and Sketch

The proposed solution is a horizon plot to show the top 40 stocks in terms of market capitalization. With the aim to visualize the effects of Covid-19 on stock prices, the colour scheme chosen such that falls in stock prices could be picked out by the reader.

Proposed Sketech.

4.0 Loading Required packages

For this task before visualization, the data would have to be scraped from a financial news portal. Thus, the following r packages are loaded:

  1. tidyquant: Tidyquant package is loaded to enable functions retrieve financial data from yahoo finance.
  2. rmarkdown Will be used to show the data in a paged table after wrangling to check the data.
  3. tidyverse: Tidyverse is loaded to enable functions for wrangling the data prior to visualization.
  4. ggHoriPlot: is used to build the horizon plots with ggplot2
  5. ggthemes Is loaded to enable wider range of themes to be used to visualize the data.
packages = c('tidyquant','rmarkdown','tidyverse','rmarkdown','ggHoriPlot', 'ggthemes')
for (p in packages) {
  if(!require(p,character.only = T)){
    install.packages(p)
  }
  library(p,character.only = T)
}

5.0 Data Scraping

To scrape the data from a financial new portal, the ticker symbol of each stock would be used to retrieve the specified stock data. The list of top stocks in terms of market capitalization is first downloaded from here. The file is stored as a csv file. In the code chunk below, read_csv function from tidyverse is used to read the data into tibble object name ‘stocks’. The spreadsheet is downloaded on 23-Feb-2022 and order of the stocks in terms of market capitalization may have changed since then.

stocks <- read_csv("data/companiesmarketcap.com - Largest companies in Singapore by market capitalization.csv")

The market spreadsheet contains information such as market cap, country of origin and price at the time of download. For scraping stock price information only the symbol of the stock is required. In the step below, the symbol column is selected out of the table. By default top 41 companies is downloaded in the list. The slice_head function is used to filter to only top 40. The output of the select function is a one column series object. The pull function is used to convert this into a vector such that it could be later using in a for loop to compile the stock data.

stock_list <-   stocks %>% select(Symbol) %>% slice_head(n = 40) %>% pull(Symbol)

The code chunk below is used to compile the stock data that is scraped from Yahoo finance website with functions from tidyquant package. A for loop is used to iteratively scrape each stock’s data using the ticker symbol.

To do so an empty dataframe is first initialized and in subsequent steps each retrieved stock information is appended to it to collate all the data. The time period for stock data is defined, using from_date and to_date variables. While the granularity of the data is defined by period_type. In this case ‘days’ is used.

Using a for loop, the symbol for each of the 40 stocks is iterated and used to retrieve stock data from Yahoo finance using tq_get function. The retrieve data is store temporarily in ‘stock_data_daily’ data table. To separate the appended data, the stock symbol is added to stock_data_daily before using a bind_rows function to append the data.

Learning from peers, it is possible to pass a vector of stock symbols to tq_get function to retrieve the data at one go. The use of a loop is not necessary.

stocks_data = data.frame()
from_date <-  "2020-01-01"
to_date <-  "2021-12-31"
period_type <-  "days"  

for (i in stock_list) {
  stock_selected <-  i 
  stock_data_daily <-  tq_get(stock_selected,
                            get = "stock.prices",
                            from = from_date,
                            to = to_date) %>% tq_transmute(select = NULL,
                                                           mutate_fun = to.period,
                                                           period  = period_type)
  stock_data_daily <- stock_data_daily %>% add_column(Symbol = i)
  stocks_data <- bind_rows(stocks_data, stock_data_daily)
}

To make the chart more readable, the retrieved price data is merged with the top 40 stock list to have full stock names in the code chuck below using inner_join function.

stocks_data_complete <- stocks_data %>% inner_join(stocks, c("Symbol" = "Symbol"))

Unused columns that are not required for the visualization is removed in the step below.

stocks_data_complete <- stocks_data_complete %>% select(!c(Rank, marketcap, `price (USD)`, country))

To preserve the order of the stocks in terms of market cap, the names of stock is converted from a character type to factor type column below.

stocks_data_complete$Name <-factor(stocks_data_complete$Name, levels = stocks$Name)

Lastly in the step below the summary function is used to inspect the stocks data. Referencing the length of symbol column is seems that some of the stocks may not have data over a full two year period. Calculating weekdays in a 2 year period for 40 stocks without subtracting public holidays should expect slightly less than 20800 rows of data (52 * 5 * 2 * 40). With a length of 18950 rows in the symbol column could suggest no data for some days of the year for some stocks. This could be inspected in the visualization.

summary(stocks_data_complete)
      date                 open              high        
 Min.   :2020-01-02   Min.   :  0.080   Min.   :  0.080  
 1st Qu.:2020-07-13   1st Qu.:  1.600   1st Qu.:  1.620  
 Median :2021-01-14   Median :  2.983   Median :  3.013  
 Mean   :2021-01-09   Mean   : 13.975   Mean   : 14.206  
 3rd Qu.:2021-07-13   3rd Qu.: 12.450   3rd Qu.: 12.648  
 Max.   :2021-12-30   Max.   :370.000   Max.   :372.700  
                                                         
      low             close            volume         
 Min.   :  0.08   Min.   :  0.08   Min.   :        0  
 1st Qu.:  1.58   1st Qu.:  1.60   1st Qu.:    36925  
 Median :  2.96   Median :  2.99   Median :  2177650  
 Mean   : 13.72   Mean   : 13.96   Mean   :  5498903  
 3rd Qu.: 12.25   3rd Qu.: 12.42   3rd Qu.:  7487182  
 Max.   :360.52   Max.   :366.99   Max.   :314380983  
                                                      
    adjusted           Symbol                      Name      
 Min.   :  0.0678   Length:18950       ST Engineering:  509  
 1st Qu.:  1.5432   Class :character   Sea (Garena)  :  504  
 Median :  2.8600   Mode  :character   DBS           :  504  
 Mean   : 13.6543                      OCBC Bank     :  504  
 3rd Qu.: 12.1064                      UOB           :  504  
 Max.   :366.9900                      Singtel       :  504  
                                       (Other)       :15921  

6.0 Generating the Visualization

The data is visualized using geom_horizon function in the code chunk below with following specified options with reference from the sample site:

stocks_data_complete %>%  
  ggplot() +
  geom_horizon(aes(date, 
                   open), origin = 'min') +
  scale_fill_hcl(palette = 'RdBu', reverse = F) +
  facet_grid(Name~.) +
  theme_few() +
  theme(
    panel.spacing.y=unit(0, "lines"),
    strip.text.y = element_text(size = 7, angle = 0, hjust = 0),
    legend.position = 'none',
    axis.text.y = element_blank(),
    axis.title.y = element_blank(),
    axis.ticks.y = element_blank(),
    axis.text.x= element_text(size = 6, angle = 0, hjust = 0.5),
    panel.border = element_blank()
    ) +
  scale_x_date(expand=c(0,0), date_breaks = "1 month", date_labels = "%b-%y") +
  ggtitle('Open price for top 40 companies in terms of Martket Capitalization \nFrom Jan-2020 to 31-Dec-2021') +
  xlab('Month-Year')

7.0 Conclusion and Takeaways

The final plot shows the day opening price of top 40 companies over a two year period from Jan-2020 to Dec-21. Rows are arranged in terms of their market capitalization.

Scanning across all the rows two distinct points could be observed to have sharp drop in stock price across numerous of the stocks, late Mar-2020 and late Oct-2020. The worse of these periods was in March 2020 which could coinciding with general global market sentiment attributed to Covid-19 panic as numerous countries enter lock down mode and locally progressively stricter measures were implemented. Affecting almost all stocks in the list. However it seem most companies recovered shortly after in April and did not see sustain impact to stock prices.

No obvious relation to pandemic events occurring in late October were found. Stocks that were mainly affected were banks (OCBC, UOB), property developers (City Developments, UOL group) and general congolmerate Jardine Singapore.

Mentioned in earlier section some stocks did not have full two year period data. These companies are Grab Holdings, TDCX, Karooooo and Triterras and are stocks that were recently listed on various stock exchange in previous two year period.